Add NPS benchmark for search speed regression testing #46
Add a node counter to AlphaBeta and a bench mode that searches 48 positions from Stockfish's bench suite, reporting per-position and total nodes, time, and NPS. Node count is deterministic and serves as the primary signal for detecting search behavior changes. Includes a CI workflow that runs on PRs and posts results as a comment.
Greptile Summary
Adds a deterministic NPS benchmark suite for regression testing search speed.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| moonfish/engines/alpha_beta.py | Adds self.nodes counter initialized to 0, incremented in both negamax() and quiescence_search(), and reset in search_move(). Clean, minimal change with no impact on search behavior. |
| moonfish/bench.py | New benchmark module with 48 Stockfish bench positions. Seeds RNG for deterministic node counts. Correctly handles terminal positions and reports per-position and total NPS. |
| moonfish/main.py | Adds bench mode to CLI. However, run_bench(depth=5) hardcodes the depth, silently ignoring the --depth CLI parameter. |
| .github/workflows/nps-benchmark.yml | CI workflow runs bench on PRs that touch engine code, parses output, and posts results as a PR comment. Permissions are appropriately scoped. Output parsing relies on consistent print format from bench.py. |
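The node-counter pattern described for alpha_beta.py can be illustrated with a minimal sketch. This is not the moonfish implementation: `CountingSearch` and its nested-list game tree are hypothetical stand-ins, used only to show a counter that is reset at the start of a search and incremented once per visited node.

```python
class CountingSearch:
    """Toy negamax over a nested-list tree; leaves are integer scores
    from the point of view of the side to move at that leaf."""

    def __init__(self):
        self.nodes = 0

    def negamax(self, node, alpha=float("-inf"), beta=float("inf")):
        self.nodes += 1  # count every visited node, leaf or interior
        if isinstance(node, int):
            return node
        best = float("-inf")
        for child in node:
            score = -self.negamax(child, -beta, -alpha)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:
                break  # beta cutoff: remaining siblings are not visited
        return best

    def search(self, root):
        self.nodes = 0  # reset so counts from earlier searches do not leak
        return self.negamax(root)


searcher = CountingSearch()
tree = [[3, 5], [2, 9]]
print(searcher.search(tree), searcher.nodes)
```

Because the counter only observes the search, the same tree always yields the same count, and any change to pruning or move ordering would show up as a different number.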
Sequence Diagram
sequenceDiagram
participant CLI as CLI (main.py)
participant Bench as bench.py
participant Engine as AlphaBeta
participant QSearch as quiescence_search
CLI->>Bench: run_bench(depth)
Bench->>Bench: random.seed(0)
loop 48 positions
Bench->>Bench: _make_board(position)
Bench->>Engine: search_move(board)
Engine->>Engine: self.nodes = 0
Engine->>Engine: negamax(board, depth)
Engine->>Engine: self.nodes += 1
Engine->>QSearch: quiescence_search(board)
QSearch->>QSearch: self.nodes += 1
Engine-->>Bench: best_move
Bench->>Bench: read engine.nodes, accumulate totals
end
Bench->>CLI: print NPS results
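The flow in the diagram can be sketched roughly as follows. The names `run_bench_sketch` and `FakeEngine` are hypothetical, the print format is an assumption, and the stand-in engine takes the depth as an argument, which may differ from how `AlphaBeta` is configured in moonfish.

```python
import random
import time


def run_bench_sketch(engine, positions, depth):
    """Search each position, then return (total_nodes, total_time, total_nps)."""
    random.seed(0)  # fixed seed so any randomized choices repeat across runs
    total_nodes = 0
    total_time = 0.0
    for index, position in enumerate(positions, start=1):
        start = time.perf_counter()
        engine.search_move(position, depth)
        elapsed = time.perf_counter() - start
        nodes = engine.nodes  # read the counter after the search finishes
        total_nodes += nodes
        total_time += elapsed
        nps = int(nodes / elapsed) if elapsed > 0 else 0
        print(f"Position {index}: nodes={nodes} time={elapsed:.4f}s nps={nps}")
    total_nps = int(total_nodes / total_time) if total_time > 0 else 0
    print(f"Total: nodes={total_nodes} time={total_time:.4f}s nps={total_nps}")
    return total_nodes, total_time, total_nps


class FakeEngine:
    """Hypothetical stand-in that 'visits' depth * len(position) nodes."""

    def __init__(self):
        self.nodes = 0

    def search_move(self, position, depth):
        self.nodes = 0  # reset per search, as in the diagram
        for _ in range(depth * len(position)):
            self.nodes += 1
        return None


totals = run_bench_sketch(FakeEngine(), ["abc", "de"], depth=2)
```

Only the node total is expected to be stable between runs; the time and NPS figures depend on the machine.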
Last reviewed commit: c5bbecd
    elif config.mode == "bench":
        run_bench(depth=5)
--depth CLI flag silently ignored in bench mode
run_bench(depth=5) hardcodes depth to 5, ignoring the --depth value passed via the CLI and stored in config.negamax_depth. A user running moonfish --mode bench --depth 3 would still get depth 5.
The CI workflow also passes --depth 5 which currently has no effect since the value is hardcoded here.
Suggested change:
-    elif config.mode == "bench":
-        run_bench(depth=5)
+    elif config.mode == "bench":
+        run_bench(depth=config.negamax_depth)
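To make the effect of the suggested change concrete, here is a small sketch of a CLI that forwards the parsed depth. It assumes an argparse-based parser purely for illustration; moonfish's actual CLI setup is not shown in this PR excerpt, and `run_bench` here is a stub that only reports the depth it received.

```python
import argparse


def run_bench(depth):
    # Stub standing in for moonfish.bench.run_bench; returns a description
    # of what it was asked to do instead of running a search.
    return f"bench at depth {depth}"


def main(argv):
    parser = argparse.ArgumentParser(prog="moonfish-sketch")
    parser.add_argument("--mode", default=None)
    # dest mirrors the config.negamax_depth attribute named in the review;
    # the default of 5 is an assumption.
    parser.add_argument("--depth", type=int, default=5, dest="negamax_depth")
    config = parser.parse_args(argv)
    if config.mode == "bench":
        # Forward the parsed value rather than a hardcoded literal.
        return run_bench(depth=config.negamax_depth)
    return None


print(main(["--mode", "bench", "--depth", "3"]))
```

With the hardcoded `depth=5`, the same invocation would report depth 5 regardless of the flag.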
🔬 Stockfish Benchmark Results
vs Stockfish Skill Level 3
Non-checkmate endings:
vs Stockfish Skill Level 4
Non-checkmate endings:
vs Stockfish Skill Level 5
Non-checkmate endings:
Configuration
⚡ NPS Benchmark Results
Per-position breakdown
Summary
- Add a node counter (self.nodes) to AlphaBeta, incremented in negamax() and quiescence_search()
- Add moonfish/bench.py with 48 positions from Stockfish's bench suite and a run_bench() function that reports per-position and total nodes, time, and NPS
- Add --mode bench to the CLI (moonfish --mode bench)
- Add a CI workflow (.github/workflows/nps-benchmark.yml) that runs on PRs and posts results as a PR comment

Node count is deterministic (RNG is seeded) and serves as the primary signal: if it changes, the PR changed search behavior. NPS is informational only since CI runner performance varies.
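One way to act on the node-count signal is to compare per-position counts from a baseline run against the PR's run. The helper below is a hypothetical sketch, not part of this PR; it assumes both runs cover the same positions in the same order.

```python
def changed_positions(baseline, current):
    """Return (position_number, old_nodes, new_nodes) for every position
    whose node count differs between the two runs."""
    return [
        (number, old, new)
        for number, (old, new) in enumerate(zip(baseline, current), start=1)
        if old != new
    ]


baseline_nodes = [10, 20, 30]  # illustrative numbers only
current_nodes = [10, 25, 30]
print(changed_positions(baseline_nodes, current_nodes))
```

An empty result would suggest the search visited the same tree and any NPS difference is down to speed alone; a non-empty result points at a change in search behavior.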
Test plan
- moonfish --mode bench runs all 48 positions and prints NPS results
- test_alpha_beta tests pass (node counter doesn't break anything)